Skip to content

Comments

Cuda oom rebase#428

Open
mjh1 wants to merge 7 commits intomainfrom
cuda-oom-rebase
Open

Cuda oom rebase#428
mjh1 wants to merge 7 commits intomainfrom
cuda-oom-rebase

Conversation

@mjh1
Copy link
Contributor

@mjh1 mjh1 commented Feb 10, 2026

This is a reopened version of #403 which got auto-closed unintentionally. This new one is now rebased against main.

@mjh1 mjh1 mentioned this pull request Feb 10, 2026
5 tasks
emranemran and others added 5 commits February 10, 2026 10:43
- Add cleanup() to Pipeline base class (gc.collect + empty_cache)
- Call cleanup on pipeline unload to free GPU memory
- Log GPU memory before/after unload and after load
- Clear pinned buffer cache on frame processor stop
- Release pipeline reference on pipeline processor stop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>
Add unload callback mechanism so PipelineManager can notify
FrameProcessors to release pipeline references before calling
cleanup(). This allows gc.collect() + empty_cache() to actually
free GPU memory during pipeline switches, not just on session end.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>
Signed-off-by: Varshith B <varshith15@gmail.com>
Signed-off-by: Varshith B <varshith15@gmail.com>
Signed-off-by: Varshith B <varshith15@gmail.com>
Signed-off-by: Varshith B <varshith15@gmail.com>
Signed-off-by: Varshith B <varshith15@gmail.com>
Copy link
Collaborator

@leszko leszko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments. In general, if it solves the OOM issue, than it's great 🎉 In that case, I'll hold off with the Subprocess work.

One general comment is that I think that there are too many changes in this PR and I think that not all of them are needed. I guess the only really needed change is the cleanup() function in interface.py and it's usage in pipeline_manager.py. Maybe also these del frame. All the rest should not be needed because it should be handled by GC.

Comment on lines +85 to +101
if hasattr(self, "components"):
try:
components_dict = getattr(self.components, "_components", None)
if components_dict is not None:
for name in list(components_dict.keys()):
del components_dict[name]
del self.components
except Exception:
pass

if hasattr(self, "state"):
try:
if hasattr(self.state, "values"):
self.state.values.clear()
del self.state
except Exception:
pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this part specific to some pipelines. Wouldn't it be possible to just iterate over all fields (like below) and delete rather than assuming that some pipelines have components and state fields?

Comment on lines +205 to +209
def release_pipeline(self):
"""Release pipeline reference to allow GC of GPU resources."""
logger.info(f"Releasing pipeline reference for {self.pipeline_id}")
self.pipeline = None

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be needed, because we set self.pipeline = None in stop() and we always call stop() and release_pipeline() together.

Comment on lines +200 to +201
# Release pipeline reference to allow GC of GPU resources
self.pipeline = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not need to explicitly set self.pipeline = None. The whole PipelineProcessor object should be cleaned after the stream is stopped.

I think the flow should be:

  1. We start the stream, there are the following objects created: VideoTrack/CloudTrack, FrameProcessor, and (a few) PipelineProcessor objects.
  2. We stop the stream, all these objects are deleted
    So effectively, the only object that is preserved between stream start/stop, is PipelineManager, because we want to avoid reloading the pipelines if they are already loaded. All the rest should be cleaned up in between stream start/stop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants